lzone-cheat-sheets/Cheat Sheets/Monitoring/Prometheus.md at master · lwindolf/lzone-cheat-sheets

Enable Scraping in Kubernetes

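Scraping is enabled either globally or explicitly per workload.

For explicit scraping, add the following annotations to Pods or Services. Note that these annotations are a convention, not a built-in feature: they only take effect if the Prometheus scrape configuration contains relabeling rules that evaluate them.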

---
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"            # optional
    prometheus.io/port: "9102"

Note that for DaemonSets you have to put the annotations in the metadata of the Pod template, not in the metadata of the DaemonSet itself:

---
apiVersion: apps/v1
kind: DaemonSet
spec:
  [...]
  template:
    metadata:
      [...]
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9102'

Query Syntax

From https://prometheus.io/docs/prometheus/latest/querying/examples/

Simple metric and matching by label

http_requests_total
http_requests_total{job="apiserver", handler="/api/comments"}

Range vector selector covering the last 5 minutes

http_requests_total{job="apiserver", handler="/api/comments"}[5m]

Pattern matching with regex

http_requests_total{job=~".*server"}
http_requests_total{status!~"4.."}

Note that regex matchers are fully anchored: treat every pattern as if it were enclosed in ^ and $. To match abc in the middle of a label value you therefore need to specify .*abc.*
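The anchoring rule can be illustrated with a minimal Python sketch. The helper name `promql_regex_match` is hypothetical, and Python's `re` module is not the RE2 engine Prometheus uses, so treat this as an approximation that only holds for simple patterns like the ones above.

```python
import re


def promql_regex_match(pattern: str, value: str) -> bool:
    """Approximate PromQL's =~ matcher: the pattern must cover the whole value."""
    # re.fullmatch behaves as if the pattern were wrapped in ^(?:...)$
    return re.fullmatch(pattern, value) is not None


# "abc" alone does not match a value that merely contains it
print(promql_regex_match("abc", "xabcx"))
# Padding with .* on both sides allows a match in the middle
print(promql_regex_match(".*abc.*", "xabcx"))
# The =~ ".*server" example from above matches "apiserver"
print(promql_regex_match(".*server", "apiserver"))
```

The negated matcher `!~` is simply the inverse of this check, so `status!~"4.."` keeps every series whose status is not a three-character value starting with 4.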

Aggregate and group by

sum(rate(http_requests_total[5m])) by (job)
sum by(a,b)(mymetric{field="value"})
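What `sum ... by (job)` does to the label sets can be sketched in a few lines of Python. This is a simplified model, not how Prometheus implements aggregation: the function `sum_by` and the sample data are made up for illustration, and each sample is modeled as a (labels, value) pair.

```python
from collections import defaultdict


def sum_by(samples, by):
    """Sum sample values per group, keeping only the labels listed in `by`."""
    totals = defaultdict(float)
    for labels, value in samples:
        # All labels not named in `by` are dropped; a missing label is
        # modeled as an empty string.
        key = tuple((name, labels.get(name, "")) for name in by)
        totals[key] += value
    return dict(totals)


# Hypothetical per-instance request rates
samples = [
    ({"job": "apiserver", "instance": "a:9090"}, 1.5),
    ({"job": "apiserver", "instance": "b:9090"}, 2.5),
    ({"job": "node", "instance": "c:9100"}, 4.0),
]

# Three input series collapse into one output series per job
result = sum_by(samples, by=("job",))
for key, total in result.items():
    print(dict(key), total)
```

The point of the sketch is that the `instance` label disappears from the result: only labels listed in `by (...)` survive the aggregation.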

Math expressions

(instance_memory_limit_bytes - instance_memory_usage_bytes) / 1024 / 1024

Top keys

topk(3, sum(rate(instance_cpu_time_ns[5m])) by (app, proc))

Count and group

count(instance_cpu_time_ns) by (app)

Default a count returning "no value" to 0

count(some_metric) or vector(0)

Find absent metrics (e.g. for alerting on missing metrics)

absent(some_metric{label1="value1", label2="value2"})
absent_over_time(some_metric{label1="value1", label2="value2"}[15m])

If you cannot write an absent() expression because the label values are dynamic and not known in advance, you can combine offset and unless. The following returns every series that had samples in a past window but has none right now:

max_over_time( your_metric[4h] ) offset 1h
  unless your_metric
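The set logic behind `unless` can be sketched in Python as follows. This is a simplified model under stated assumptions: series are (labels, value) pairs, the metric name is left out of the label sets (PromQL ignores it when matching), and the `pod` label and sample values are hypothetical.

```python
def unless(left, right):
    """Keep series from `left` whose label set has no counterpart in `right`."""
    right_keys = {frozenset(labels.items()) for labels, _ in right}
    return [
        (labels, value)
        for labels, value in left
        if frozenset(labels.items()) not in right_keys
    ]


# Series seen in the past window (left-hand side of `unless`)
seen_earlier = [({"pod": "a"}, 3.0), ({"pod": "b"}, 7.0)]
# Series that still report a value now (right-hand side)
current = [({"pod": "a"}, 2.0)]

# Only the series that vanished remains in the result
missing = unless(seen_earlier, current)
print(missing)
```

Because the result keeps the labels of the vanished series, an alert built on this expression can name the exact series that went missing, which a plain absent() cannot do.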

Rewriting labels on the fly: create a new label namespace2 from a .* match on the label namespace:

label_replace(http_requests_total, "namespace2", "$1", "namespace", "(.*)")
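The effect of `label_replace` on a single series can be approximated in Python. This is an illustrative sketch, not the Prometheus implementation: it only handles `$N` references (not `${N}` or named groups), uses Python's `re` instead of RE2, and operates on a plain dict of labels.

```python
import re


def label_replace(labels, dst, replacement, src, regex):
    """Return a copy of `labels` with `dst` set from a regex match on `src`."""
    value = labels.get(src, "")
    # The regex is fully anchored, like PromQL label matchers
    match = re.fullmatch(regex, value)
    if match is None:
        # No match: the series is returned unchanged
        return dict(labels)
    # Translate "$1" style references into Python's "\g<1>" template syntax
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    new_labels = dict(labels)
    new_labels[dst] = match.expand(template)
    return new_labels


# Same arguments as the PromQL example above
copied = label_replace({"namespace": "prod"}, "namespace2", "$1", "namespace", "(.*)")
print(copied)

# A capture group can also extract part of the value (hypothetical labels)
region = label_replace({"namespace": "prod-eu"}, "region", "$1", "namespace", "prod-(.*)")
print(region)
```

With the `(.*)` pattern every value matches, so the call simply copies `namespace` into `namespace2`; a narrower pattern leaves non-matching series untouched.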

A good cheat sheet is https://promlabs.com/promql-cheat-sheet/

Find metric syntax errors

Prometheus is strict about the exposition format: a single syntax error on a metrics endpoint causes it to ignore the entire endpoint. To find out which line causes the error, run promtool check metrics. promtool is included in your Prometheus pod. It is also a static binary, so you can copy it anywhere else and use it there:

curl -s localhost:8080/metrics | promtool check metrics

Relabeling Metrics

See https://last9.io/blog/mastering-prometheus-relabeling-a-comprehensive-guide/

Backfill Recording Rules

If you create new recording rules it is possible to compute them for older data too: https://github.com/prometheus/prometheus/blob/main/docs/storage.md#backfilling-for-recording-rules

Problems of Prometheus

Issues with quantile calculation

Scaling Prometheus

Self-hosted TSDB

SaaS

GrafanaCloud
Influx

Infrastructure Monitoring

Blackbox Exporter (allows defining TCP, HTTP... checks for non-k8s endpoints)
Adding many Blackbox Exporters
Running multi-target exporters
Mixins for different tools
Grafana k8s Cost Report
